# Hey look, it's penguins
## Here’s a look at the Palmer Penguins data set:
The data set has 344 rows and 8 columns:

| column | type | first values |
|---|---|---|
| species | fct | Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel… |
| island | fct | Torgersen, Torgersen, Torgersen, Torgersen, Torgerse… |
| bill_length_mm | dbl | 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, … |
| bill_depth_mm | dbl | 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, … |
| flipper_length_mm | int | 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186… |
| body_mass_g | int | 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, … |
| sex | fct | male, female, female, NA, female, male, female, male… |
| year | int | 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007… |
The steps of data science are:
We don’t need to do any real cleaning for this data set, but we can show some pretty pictures…
Here, we see that within each species, male and female penguins have distinct body mass distributions. Don’t believe your eyes?
Here are some stats from a linear model of body mass (in grams) on sex, species, and their interaction:
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 3368.8356 | 36.21222 | 93.030365 | 0.0000000 |
| sexmale | 674.6575 | 51.21181 | 13.173867 | 0.0000000 |
| speciesChinstrap | 158.3703 | 64.24029 | 2.465279 | 0.0142039 |
| speciesGentoo | 1310.9058 | 54.42228 | 24.087666 | 0.0000000 |
| sexmale:speciesChinstrap | -262.8928 | 90.84950 | -2.893718 | 0.0040627 |
| sexmale:speciesGentoo | 130.4372 | 76.43559 | 1.706498 | 0.0888649 |
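As a quick sanity check on how the columns relate, the `statistic` column should be each estimate divided by its standard error. The sketch below is a minimal illustration in Python that uses only numbers copied from the table. The names `coefficients`, `t_statistic`, and `t_values` are made up for this example and do not come from the original analysis.

```python
# Estimates and standard errors copied from the table above.
# Keys are the `term` labels; values are (estimate, std.error).
coefficients = {
    "(Intercept)": (3368.8356, 36.21222),
    "sexmale": (674.6575, 51.21181),
    "speciesChinstrap": (158.3703, 64.24029),
    "speciesGentoo": (1310.9058, 54.42228),
    "sexmale:speciesChinstrap": (-262.8928, 90.84950),
    "sexmale:speciesGentoo": (130.4372, 76.43559),
}


def t_statistic(estimate, std_error):
    """Return the test statistic for one coefficient: estimate / std.error."""
    return estimate / std_error


# Recompute the `statistic` column for every term.
t_values = {
    term: t_statistic(estimate, std_error)
    for term, (estimate, std_error) in coefficients.items()
}

for term, t in t_values.items():
    print(f"{term:<26} t = {t:8.3f}")
```

The recomputed values should agree with the `statistic` column up to rounding of the printed estimates.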
The equation for our model is:
\[ E( \operatorname{body\_mass\_g} ) = \alpha + \beta_{1}(\operatorname{sex}_{\operatorname{male}}) + \beta_{2}(\operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{3}(\operatorname{species}_{\operatorname{Gentoo}}) + \beta_{4}(\operatorname{sex}_{\operatorname{male}} \times \operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{5}(\operatorname{sex}_{\operatorname{male}} \times \operatorname{species}_{\operatorname{Gentoo}}) \]
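To see what the equation says in grams, we can plug the estimates from the table into it for each sex and species. This is a minimal Python sketch, not the original analysis code. It assumes the indicator terms are 0/1 dummies with female Adelie as the reference group, which is how the term names in the table read. The function `expected_body_mass` and the dictionary `predictions` are hypothetical names for this illustration.

```python
from itertools import product

# Estimates copied from the model table: alpha, then beta_1 through beta_5.
ALPHA = 3368.8356    # (Intercept): assumed to be the female Adelie baseline
BETA_1 = 674.6575    # sexmale
BETA_2 = 158.3703    # speciesChinstrap
BETA_3 = 1310.9058   # speciesGentoo
BETA_4 = -262.8928   # sexmale:speciesChinstrap
BETA_5 = 130.4372    # sexmale:speciesGentoo


def expected_body_mass(sex, species):
    """Evaluate the model equation for one sex/species combination."""
    # 0/1 indicator variables (assumption: treatment coding).
    male = 1 if sex == "male" else 0
    chinstrap = 1 if species == "Chinstrap" else 0
    gentoo = 1 if species == "Gentoo" else 0
    return (
        ALPHA
        + BETA_1 * male
        + BETA_2 * chinstrap
        + BETA_3 * gentoo
        + BETA_4 * male * chinstrap
        + BETA_5 * male * gentoo
    )


# Expected body mass in grams for all six sex-by-species cells.
predictions = {
    (sex, species): expected_body_mass(sex, species)
    for species, sex in product(["Adelie", "Chinstrap", "Gentoo"], ["female", "male"])
}

for (sex, species), mass in predictions.items():
    print(f"{species:<10} {sex:<7} {mass:8.1f} g")
```

Under these assumptions, a male Gentoo comes out at about 5484.8 g (all four relevant terms added together), while a female Adelie is just the intercept, about 3368.8 g.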